Welcome to reshaping data in R using dplyr and tidyr. You’ll learn how to convert between wide and long formats, when to use each, and how to do calculations more efficiently with long-format data.
Prerequisites
Before diving into the analysis, let’s load the necessary R packages. These packages will help us manipulate data efficiently.
library(dplyr) # For data manipulation
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(tidyr) # For reshaping data
library(here) # For building file paths
here() starts at C:/IMF-R-Book
Understanding Wide and Long Formats
The same dataset can be written in two different formats: wide and long.
Wide Format
In a wide format, values do not repeat in the first column. This format works well if you have a time series of one variable for various countries, as each country’s data for different years can be spread across multiple columns.
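As a minimal sketch of the wide layout described above, the block below stores a single variable (consumption) with one row per country and one column per year. The object name consumption_wide is hypothetical and chosen only for illustration; the values are the consumption figures from this chapter's example dataset.

```r
# Hypothetical object name; values are the consumption figures
# from the chapter's example dataset.
# check.names = FALSE keeps the year column names as "2016", "2017", ...
consumption_wide <- data.frame(
  country = c("BRA", "MEX", "USA"),
  `2016` = c(5000, 4000, 15000),
  `2017` = c(5100, 4100, 15500),
  `2018` = c(5200, 4200, 16000),
  `2019` = c(5300, 4300, 16500),
  check.names = FALSE
)

# Each country appears exactly once in the first column
consumption_wide
```

Because every country occupies a single row, adding another year means adding a column rather than new rows.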
Long Format
In a long format, values repeat in the first column. This format is more efficient when you have multiple variables for different countries, as it allows for easier manipulation and analysis.
   year country consumption investment exports imports
1  2016     BRA        5000       1500    1000     800
2  2016     MEX        4000       1200    2000    1000
3  2016     USA       15000       5000    6000    3000
4  2017     BRA        5100       1550    1100     850
5  2017     MEX        4100       1250    2100    1050
6  2017     USA       15500       5100    6200    3100
7  2018     BRA        5200       1600    1200     900
8  2018     MEX        4200       1300    2200    1100
9  2018     USA       16000       5200    6400    3200
10 2019     BRA        5300       1650    1300     950
11 2019     MEX        4300       1350    2300    1150
12 2019     USA       16500       5300    6600    3300
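To follow along without a data file, the table above can be typed in directly. This is only a sketch: the object name gdp_data is hypothetical, and the values are copied from the table. The two checks at the end illustrate the point about repeated values in the first column.

```r
library(dplyr)

# Hypothetical object name; values copied from the table above
gdp_data <- data.frame(
  year = rep(2016:2019, each = 3),
  country = rep(c("BRA", "MEX", "USA"), times = 4),
  consumption = c(5000, 4000, 15000, 5100, 4100, 15500,
                  5200, 4200, 16000, 5300, 4300, 16500),
  investment = c(1500, 1200, 5000, 1550, 1250, 5100,
                 1600, 1300, 5200, 1650, 1350, 5300),
  exports = c(1000, 2000, 6000, 1100, 2100, 6200,
              1200, 2200, 6400, 1300, 2300, 6600),
  imports = c(800, 1000, 3000, 850, 1050, 3100,
              900, 1100, 3200, 950, 1150, 3300)
)

# Each year appears once per country, so year values repeat
# down the first column
year_counts <- count(gdp_data, year)
year_counts

# Every year-country pair identifies exactly one row
nrow(distinct(gdp_data, year, country)) == nrow(gdp_data)
```

Building the data with rep() keeps the year and country columns aligned with the row order of the printed table.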
Reshaping Data
Let’s start by reshaping a dataset from wide to long format. We’ll use the example dataset that contains GDP components (consumption, investment, exports, and imports) for multiple countries.
Wide to Long
To reshape the wide data (with columns year, country, consumption, investment, exports, and imports) into long format, we’ll use the pivot_longer() function from the tidyr package: